Conversation log 2021-01-30 - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

Conversation log 2021-01-30

I tried it with voice input because some people said they wanted to use it by voice.

It does not work well with voice. It is fatal that important keywords are misrecognized and come out garbled.

Then I switched to text input and realized that "I want to use it with voice" is not the same as "I want to use it with voice input".

https://keicho.netlify.app/#talk=yaICpGfZchliFVmnnNM3

---

nisbot.iconWhat would you like to see happen in this conversation?

nishio.iconI would like to test voice input because there are quite a few people who say, "Oh, it would be nice if we could use this system with voice."

nisbot.iconWhat type of audio is that?

nishio.iconRight now I'm taking a walk and using my AirPods for voice input.

nisbot.iconWhat type of voice input is that?

nishio.iconStarting voice input is a bit annoying, because right now I have to tap the voice input button on the keyboard every time.

nisbot.iconIs there anything else about that voice input?

nishio.iconAlso, this chat storm...

I'm trying to send it to you, but I can't.

When I press the enter key, it sends the message, but it doesn't seem to recognize the characters entered when the enter key is pressed in the voice input, so I have to quit the voice input every now and then.

It's a hassle to have to long the man-male button.

nisbot.iconHow do you know that voice input?

nishio.iconAside from voice input, the chatbot's responses are not read aloud right now, so I have to look at the screen, which is a hassle.

nisbot.iconWhere is that voice input located?

nishio.iconI tap the voice input button on the keyboard every single time.

nisbot.iconWhere is that voice input?

nishio.iconIt is a standard feature of the iPhone.

nisbot.icon

Aside from voice input, the chatbot's responses are not read aloud right now, so I have to look at the screen, which is a hassle.

What does this voice input look like?

nishio.iconI think people who want to use voice input for e-mail probably envision using it as if they were having a conversation with a human, but in the field, you have to tap a button to input voice data and read what appears in the rain by hand, so it seems a little less smooth.

nisbot.iconIs there anything else you can tell us about that audio?

nishio.iconI don't know, 9sawa, is that really necessary?

nisbot.icon

I tap the voice input button on the keyboard every time.

What type of keyboard is this keyboard?

nishio.iconNGKW Keyboard

nisbot.icon

I think that people who want to use voice input for e-mail probably envision using it as if they were having a conversation with a human, but in the field, you have to tap every single button to input voice data and read what appears in the rain by hand, so it seems a little less smooth.

Where is this audio located?

nishio.iconMaybe the destination needs to be audio, too.

nisbot.iconHow do you know that audio?

nishio.iconI think what they are saying is that they want to hear the response as audio and answer it by voice.

nisbot.iconWhat does that audio sound like?

nishio.iconI guess they want to communicate as if they were having a voice conversation with another human, which I'm not really interested in.

nisbot.iconWhat is it that you value?

nishio.iconSo, what I value is that the output of text remains for 6 minutes, and by putting it in through my eyes, it disappears the moment I say it out loud, but with text, it remains for a while in my eyes, and by putting it in through my eyes, it disappears the moment I say it out loud, but with text, it disappears the moment I say it out loud. I think I feel that the synergistic effect is beneficial to human intelligence by the fact that the text stays in my eyes for a while. I think that the synergistic effect of voice input is that it is beneficial to go to the early morning of memory, and that is what I feel the value of. This synergistic effect is beneficial to human intelligence, and I think they feel that it is valuable.

nisbot.iconWhat should happen with that voice input?

nishio.iconIn fact, when a human being coaches with this method, he or she repeats back what the other person has said, and it functions as a mechanism to input what was said into the head again, rather than cutting it off at the moment it was spoken, even if the person does not hold the voice B, it is not sound but text, which is easily obtained if the input is text. There are two functions that you can get.

nisbot.iconWhat needs to happen to voice input for that to happen?

nishio.iconIn fact, when a human coachee uses this technique, he or she repeats back what the other person has said, and this works as a mechanism to input what was said into the head again, rather than cutting it off at the moment it was said, but that is a function that is easily available if you use text rather than sound, unless you have already used voice input. But that's a feature you can easily get if you type in text rather than sound if you don't use voice.

nisbot.iconCan that audio do it?

nishio.iconWhen something like this happens, the flow state is interrupted, and you can't immerse yourself in the object of post-thought, which is a big disadvantage, so I think it's better to say that voice cannot be used in the field.

nisbot.iconWhere does the audio come from?

nishio.iconText input from here. To summarize so far, voice input/output works against the intrinsic value of this tool, because "

nisbot.iconWhat should happen to that audio?

nishio.iconThis is because it is important that "what you output is logged as text, and primary memory is augmented by taking in the on-screen log through your eyes." When a human actually uses this technique orally, they repeat back what the other person says.

nisbot.iconWhat needs to happen to the voice for that to happen?

nishio.iconDuring that repeat-back, the utmost care should be taken not to change the words used by the other party. Voice input/output must be able to repeat back what the user said without errors.

nisbot.iconCan voice input do that?

nishio.iconCurrent technology is not even close. Important keywords like "log" and "short-term memory enhancement" come out garbled when entered by voice.

nisbot.iconWhere is that audio?

nishio.iconIt's better to just repeat back the recording without applying voice recognition.

nisbot.iconWhat is the relationship between that voice and voice input?

nishio.iconYes, that's it. When a user wants to "communicate by voice," what is really needed is not to recognize the speech and convert it to a text string. It is more likely correct to play the speech back as it was spoken, or to find and cut out the keyword portions from the audio itself, using intonation and the like.

nisbot.iconIs that voice the same as voice input? Is it different?

nishio.iconI assumed they were the same, but they weren't!

nisbot.icon

Yes, that's it. When a user wants to "communicate by voice," what is really needed is not to recognize the speech and convert it to a text string. It is more likely correct to play the speech back as it was spoken, or to find and cut out the keyword portions from the audio itself, using intonation and the like.

What happens to the voice input when this voice

nishio.iconIt is not used for the dialogue itself. The difference is whether you "listen back" or "read back" when reviewing the dialogue log later.

nisbot.icon

> Yes, I know that I don't feel the value of voice input for the time being, but what I do feel the value of is that the output as text remains for 6 minutes, and by putting it in through my eyes, it disappears the moment I say it out loud if it's voice, but if it's text, it stays in my eyes for a while for 2 days. I think that the synergistic effect is beneficial for human intelligence by saying that the text stays for a while in the second day. What I feel the value of is that the output of text remains for six minutes, and by putting it in through my eyes, it disappears the moment I speak it out loud if it is voice, but if it is text, it remains for a while on the second day, which makes it possible for human beings to have a synergistic effect on their memory at one o'clock in the early morning. In the early morning, the synergistic effect is beneficial to human intelligence, and you feel the value of it.

What happens to the voice during this voice input?

nishio.iconIn the current system, converting speech to a text string via voice input gets in the way of smooth voice interaction. If you want to do it by voice, the system should repeat back by voice alone, so that you do not have to look at the screen.

nisbot.iconWhat is between that voice and voice input?

nishio.iconThere should be keyword extraction in audio form before audio is turned into strings!

nisbot.icon

In the current system, converting speech to a text string via voice input gets in the way of smooth voice interaction. If you want to do it by voice, the system should repeat back by voice alone, so that you do not have to look at the screen.

Where does this voice input come from?

nishio.iconI wonder where it comes from. Maybe someone is researching keyword extraction directly on audio.

nisbot.iconWhat is the relationship between that audio and repeat-back?

nishio.iconIf you're communicating by voice, you can't see the log screen, so the repeat-back needs to be done by voice.

nisbot.iconIs that audio the same as repeat? Is it different?

nishio.iconDifferent. With text, the log serves the function of repeat back.

nisbot.icon

If you communicate by voice, you can't see the log screen, so the repeat-back needs to be done by voice.

What happens to the repeat during this audio?

nishio.iconI'd think it would take longer to get an audio repeat back than it would to look at the log.

nisbot.iconWhat happens to the audio during that repeat?

nishio.iconYou might want to interject and rephrase.

nisbot.icon

I think it would take longer to get an audio repeat back than it would to look at the logs.

What is between this audio and the repeat?

nishio.iconInstead of repeating back the whole recording, it might be good to remove fillers such as "uh."

nisbot.icon

I think it would take longer to get an audio repeat back than it would to look at the logs.

What is the relationship between this audio and text?

---

This page is auto-translated from /nishio/会話ログ2021-01-30. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.